Abstract

Most nations of the world periodically publish N × N origin-destination tables, recording the number of people who lived in geographic subdivision i at time t and in subdivision j at time t + 1. We have developed, and widely applied to such national tables and to other analogous (weighted, directed) socioeconomic networks, a two-stage procedure: double standardization followed by (strong component) hierarchical clustering. Previous applications of this methodology and related analytical issues are discussed. Its use is illustrated in a large-scale study employing recorded United States internal migration flows between the 3,000+ county-level units of the nation for the periods 1965-1970 and 1995-2000. Prominent, important features, such as "cosmopolitan hubs" and "functional regions", are extracted from master dendrograms. The extent to which such characteristics have varied over the intervening thirty years is evaluated.
I. INTRODUCTION



A. L. Barabási, in his recent popular book, "Linked", asserts that the emergence of hubs in networks is a surprising phenomenon that is "forbidden by both the Erdős-Rényi and Watts-Strogatz models" [1, p. 63] [2, Chap. 8]. Here, we describe, and apply anew to extensive U. S. intercounty migration data, an analytical framework introduced in 1974 that the distinguished computer scientist R. C. Dubes, in a review of the compilation of multitudinous results [3], asserted "might very well be the most successful application of cluster analysis" [4, p. 142]. This two-stage methodology has proved insightful in revealing, in addition to functional clusters, hub-like structures in networks of (weighted, directed) internodal flows. This approach, together with its many diverse socioeconomic applications, was documented in a large number of (subject-matter and technical) journal articles (among them [5]-[22]), as well as in the research institute monographs [3], [23], [24]. It has also been the subject of various comments, criticisms and discussions [25]-[39] (cf. [40], [41]).

Though this procedure is applicable in a wide variety of social-science settings [3], it has been primarily used, in a demographic context, to study the internal migration tables published at regular periodic intervals by most of the nations of the world. These tables can be thought of as N × N (square) matrices, the entries ($m_{ij}$) of which are the number of people who lived in geographic subdivision i at time t and in j at time t + 1. (Some tables, but not all, have diagonal entries, $m_{ii}$, which may represent either the number of people who moved within area i, or simply those who lived in i both at t and at t + 1. It can sometimes be of interest to compare analyses with zero and nonzero diagonal entries [23]. However, this aspect will not be of any immediate concern to us here.) Below, we principally consider U. S. migration tables for the periods 1965-1970 [22] and 1995-2000, based on 3,000+ county-level units.
II. TWO-STAGE METHODOLOGY



A. First Step: Double-Standardization of Raw Flows

In the first step (the iterative proportional fitting procedure [IPFP] [42]), the rows and columns of the table of flows are alternately (biproportionally [13]) scaled to sum to a fixed number (say 1). Under broad conditions, to be discussed below, convergence occurs to a "doubly-stochastic" (bistochastic) table, with row and column sums all simultaneously equal to 1 [43]-[49]. The purpose of the scaling is to remove overall (marginal) effects of size, and to focus on relative, interaction effects. Nevertheless, the cross-product ratios (relative odds) $m_{ij} m_{kl} / (m_{il} m_{kj})$, which are measures of association, are left invariant, since the row multipliers of i and k and the column multipliers of j and l appear in both numerator and denominator and cancel. Additionally, the entries of the doubly-stochastic table provide maximum entropy estimates of the original flows, given the row and column constraints [50] [51].
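A minimal sketch of the first step, in pure Python, may make the alternating scaling concrete. It assumes a small dense table in which every row and column contains at least one positive entry and whose zero pattern permits convergence; the function name `double_standardize` and the tolerance values are our own illustrative choices, not those of the original FORTRAN or SAS programs. The usage lines also check that a cross-product ratio survives the scaling unchanged.

```python
def double_standardize(m, tol=1e-10, max_iter=10_000):
    """Iterative proportional fitting of a square nonnegative table:
    rows and columns are alternately rescaled to sum to 1.
    Assumes every row and column contains at least one positive entry."""
    n = len(m)
    a = [list(map(float, row)) for row in m]  # work on a float copy
    for _ in range(max_iter):
        # Row step: divide each row by its current sum.
        for i in range(n):
            s = sum(a[i])
            a[i] = [x / s for x in a[i]]
        # Column step: divide each column by its current sum.
        col = [sum(a[i][j] for i in range(n)) for j in range(n)]
        for i in range(n):
            a[i] = [a[i][j] / col[j] for j in range(n)]
        # Columns now sum to 1; stop once the rows do too.
        if max(abs(sum(row) - 1.0) for row in a) < tol:
            return a
    raise RuntimeError("no convergence; the table may be critically sparse")


if __name__ == "__main__":
    raw = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
    b = double_standardize(raw)
    # The cross-product ratio m00*m11/(m01*m10) is unchanged by the scaling.
    print(raw[0][0] * raw[1][1] / (raw[0][1] * raw[1][0]))
    print(b[0][0] * b[1][1] / (b[0][1] * b[1][0]))
```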

For large sparse flow tables, only the nonzero entries, together with their row and column coordinates, are needed. Row and column (biproportional) multipliers can be iteratively computed by sequentially accessing the nonzero cells [52]. If the table is "critically sparse", various convergence difficulties may occur. Nonzero entries that are "unsupported" (that is, not part of a set of N nonzero entries, no two in the same row or column) may converge to zero and/or the biproportional multipliers may not converge [3, p. 19] [53] [54, p. 171]. The "first strongly polynomial-time algorithm for matrix scaling" was reported in [55].
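For sparse tables, the same iteration can be run over a list of (row, column, value) triples, updating only the multipliers, as described above. The sketch below is illustrative only: `sparse_multipliers` is a hypothetical helper, it assumes every row and column has at least one nonzero cell, and it simply gives up (raising an error) after a fixed number of passes. That is what happens for a critically sparse pattern such as [[1, 1], [0, 1]], whose unsupported (0, 1) entry decays only slowly toward zero.

```python
def sparse_multipliers(n, cells, tol=1e-10, max_iter=1000):
    """Biproportional multipliers r, c computed from nonzero cells only.
    cells: list of (i, j, value) triples with value > 0. The scaled entry
    r[i] * value * c[j] then has unit row and column sums."""
    r = [1.0] * n
    c = [1.0] * n
    for _ in range(max_iter):
        # One sequential pass over the nonzero cells gives the row sums ...
        rs = [0.0] * n
        for i, j, v in cells:
            rs[i] += v * c[j]
        r = [1.0 / s for s in rs]
        # ... and a second pass gives the column sums.
        cs = [0.0] * n
        for i, j, v in cells:
            cs[j] += r[i] * v
        c = [1.0 / s for s in cs]
        # Columns now sum to 1; check whether the rows do as well.
        rs = [0.0] * n
        for i, j, v in cells:
            rs[i] += r[i] * v * c[j]
        if max(abs(s - 1.0) for s in rs) < tol:
            return r, c
    raise RuntimeError("multipliers did not converge (critically sparse?)")
```

Storing only the triples keeps memory proportional to the number of nonzero cells, which matters for a table that is over 90% zeros.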

The scaling was successfully implemented, in our largest analysis, with a 3,140 × 3,140 1965-70 intercounty migration table for the United States, 94.5% of whose entries were zero [9] [23], as well as with a more aggregate 510 × 510 table (with State Economic Areas as the basic unit) for the US for the same period [H]. (Smoothing procedures could be used to modify the zero-nonzero structure of a flow table, particularly if it is critically sparse [56] [57]. If one takes the second power of a doubly-stochastic matrix, as we in fact do in Sec. V C, one obtains another such matrix, of predicted two-stage movements, but smoother in character. One might also consider standardizing the ith row [column] sum to be proportional to the number of nonzero entries in the ith row [column], although we found considerable numerical difficulties when attempting this for the 1995-2000 U. S. intercounty migration table [Sec. V]. Another procedure, in line with the much studied and emulated Google page-ranking ["teleporting random walk"] procedure [58] [59], is to take some convex combination of the doubly-stochastic table and the N × N table all the off-diagonal entries of which are equal to $1/(N-1)$.)
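The two modifications mentioned in the parenthetical remark above, squaring a doubly-stochastic matrix and mixing it with the uniform off-diagonal table, both preserve double stochasticity. The following sketch illustrates them; the mixing weight `alpha` is a free parameter that the text does not specify (values near 0.85 are commonly quoted for PageRank), and the function names are hypothetical.

```python
def square(p):
    """Second power of a matrix: predicted two-stage movements."""
    n = len(p)
    return [[sum(p[i][k] * p[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def teleport_mix(p, alpha):
    """Convex combination alpha*P + (1 - alpha)*U, where U has a zero
    diagonal and every off-diagonal entry equal to 1/(N - 1). U is itself
    doubly stochastic, so the mixture stays doubly stochastic."""
    n = len(p)
    u = 1.0 / (n - 1)
    return [[alpha * p[i][j] + (1 - alpha) * (0.0 if i == j else u)
             for j in range(n)] for i in range(n)]
```

Note that squaring a zero-diagonal matrix can create nonzero diagonal entries (predicted return moves), whereas the mixture leaves the diagonal at zero and makes every off-diagonal entry positive.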

Figs. 1 and 2 give a graphic display of the effect of the double-standardization (biproportional adjustment) on a 1962-68 French interprovincial migration table (the island of Corsica is omitted). Additionally, Figs. 3 and 4 are comparable displays for 1995-2000 U. S. interstate migration (Alaska, Hawaii and the District of Columbia are not included) (cf. [60]). The process employed by Waldo Tobler to produce these four figures was the following: (1) the average value (A) of all the entries in the (adjusted or unadjusted) table was found; (2) if the sum of the ij and ji entries of the corresponding table exceeded 2A, a bar, the thickness of which is proportional to this sum, was drawn connecting area i to area j. We see that in the two figures based on the double-standardization procedure, linkages between adjacent areas are much more strongly stressed (suiting our purposes) than in those based on the raw flows themselves.
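Tobler's selection rule for the bars in Figs. 1-4 can be written down directly. In this sketch we assume that the average A is taken over all N × N cells, including the diagonal, as the description literally says; `salient_links` is a hypothetical name, and drawing the bars themselves is left out.

```python
def salient_links(m):
    """Pairs (i, j, m[i][j] + m[j][i]) whose two-way flow exceeds twice the
    average entry A of the table; the sum would set the bar thickness."""
    n = len(m)
    avg = sum(map(sum, m)) / (n * n)  # average over all N*N cells
    links = []
    for i in range(n):
        for j in range(i + 1, n):
            both = m[i][j] + m[j][i]
            if both > 2 * avg:
                links.append((i, j, both))
    return links
```

Applying the same function to a raw table and to its doubly-standardized form gives the two kinds of display being compared.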

B. Second Step: Strong Component Hierarchical Clustering 

In the second step of the two-stage procedure, the doubly-stochastic matrix is converted to a series of directed (0,1) graphs (digraphs) by applying thresholds to its entries: an edge from i to j is present whenever the (i, j) entry is at least the threshold. As the thresholds are progressively lowered, larger and larger strong components (a directed path existing from any member of a component to any other) of the resulting graphs are found. This process (a simple variant of well-known single-linkage [nearest-neighbor or min] clustering [61]) can be represented by the familiar dendrogram or tree diagram used in hierarchical cluster analysis and cladistics/phylogeny (cf. [62], [63]). (The "CLASSIC" methodology proposed somewhat later by Ozawa, though couched in rather different terminology, appears to be fully equivalent to ours. Ozawa found the procedure to be useful in "the detection of gestalt clusters" [62].)
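The thresholding step can be illustrated with a deliberately naive sketch: lower the threshold through the distinct off-diagonal values, add every edge i → j whose entry is at least the threshold, and recompute the strong components each time. We swap in Kosaraju's algorithm for the component computation purely for brevity; this costs O(T(N + M)) for T distinct thresholds and is not the Tarjan method discussed in Sec. II C. Function names are hypothetical. All edges sharing a value are added at once, so a single threshold can merge more than two clusters.

```python
def strong_components(n, adj):
    """Kosaraju's algorithm (iterative); returns a component label per node."""
    seen, order = [False] * n, []
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            v, it = stack[-1]
            w = next(it, None)
            if w is None:              # all successors of v explored
                stack.pop()
                order.append(v)
            elif not seen[w]:
                seen[w] = True
                stack.append((w, iter(adj[w])))
    radj = [[] for _ in range(n)]      # reversed graph
    for v in range(n):
        for w in adj[v]:
            radj[w].append(v)
    label, comp = [-1] * n, 0
    for s in reversed(order):          # decreasing finish time
        if label[s] != -1:
            continue
        label[s], stack = comp, [s]
        while stack:
            v = stack.pop()
            for w in radj[v]:
                if label[w] == -1:
                    label[w] = comp
                    stack.append(w)
        comp += 1
    return label


def strong_component_hierarchy(p):
    """Lower a threshold through the distinct off-diagonal entries of p and
    return (threshold, members) each time a new strong component appears."""
    n = len(p)
    cells = sorted(((p[i][j], i, j) for i in range(n) for j in range(n)
                    if i != j and p[i][j] > 0), reverse=True)
    adj = [[] for _ in range(n)]
    prev = {frozenset([v]) for v in range(n)}
    merges, k = [], 0
    while k < len(cells) and len(prev) > 1:
        t = cells[k][0]
        while k < len(cells) and cells[k][0] == t:  # all edges tied at t
            adj[cells[k][1]].append(cells[k][2])
            k += 1
        groups = {}
        for v, c in enumerate(strong_components(n, adj)):
            groups.setdefault(c, set()).add(v)
        current = {frozenset(g) for g in groups.values()}
        for g in sorted(current - prev, key=min):
            merges.append((t, sorted(g)))
        prev = current
    return merges
```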

C. Computer implementations 

A FORTRAN implementation of the two-stage process was given in [64], as well as a realization in the SAS (Statistical Analysis System) framework [65]. Subsequently, the noted computer scientist R. E. Tarjan [66] devised an $O(M (\log N)^2)$ algorithm [67] for strong component hierarchical clustering, and then a further improved $O(M \log N)$ method [68], where N is the number of nodes and M the number of edges of a directed graph. (These substantially improved upon the earlier works [64] [65], which required the computation of transitive closures of graphs, in terms of which the analysis of Ozawa [62] is phrased, and were $O(MN)$ in nature.) A FORTRAN coding, involving linked lists, of the improved Tarjan algorithm [68] was presented in [69], and applied in the aforementioned 1965-70 US intercounty study [23]. If the graph-theoretic (0,1)-structure of a network under study is not strongly connected [70], independent two-stage analyses of the subsystems of the network would be appropriate.

FIG. 1: Salient flows based on unadjusted 1962-68 French interprovincial migration table.

FIG. 2: Salient flows based on doubly-standardized 1962-68 French interprovincial migration table. Note the increase in linkages between adjacent areas.

FIG. 3: Salient flows based on the unadjusted 1995-2000 U. S. interstate migration table.

FIG. 4: Salient flows based on the doubly-standardized (biproportioned) 1995-2000 U. S. interstate migration table. Note the increase in linkages between adjacent areas.

The goodness-of-fit of the generated dendrogram to the doubly-stochastic table itself can be evaluated, and possibly employed, it would seem, as an optimization criterion (cf. [71, p. 210] [72, Sec. 3] [73]). Distances between nodes in the dendrogram satisfy the (stronger than triangular) ultrametric inequality, $d_{ij} \le \max(d_{ik}, d_{jk})$ [74, p. 245] [75, eq. (2.2)]. We will examine issues pertaining to ultrametric fit and residuals from such fits in Sec. V A. (Costa and colleagues have studied hierarchical aspects of "complex networks" [76] [77].)
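Given the mergers produced by such a hierarchy, the fitted ultrametric can be read off as the threshold at which two units first share a cluster; a distance is then obtained by subtracting this value from a constant, as is done for Fig. 7. The sketch below (hypothetical names; merges assumed to be listed in decreasing threshold order and to cover all pairs) builds the fitted matrix and checks the inequality in its equivalent similarity form, $u_{ij} \ge \min(u_{ik}, u_{jk})$.

```python
from itertools import permutations


def ultrametric_fit(n, merges):
    """u[i][j] = largest threshold at which i and j lie in one cluster.
    merges: (threshold, members) pairs in decreasing threshold order."""
    u = [[None] * n for _ in range(n)]   # diagonal stays None
    for t, members in merges:
        for i in members:
            for j in members:
                if i != j and u[i][j] is None:
                    u[i][j] = t
    return u


def is_ultrametric(u):
    """Similarity form of d_ij <= max(d_ik, d_jk), where d = const - u."""
    n = len(u)
    return all(u[i][j] >= min(u[i][k], u[j][k])
               for i, j, k in permutations(range(n), 3))
```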

III. EMPIRICAL RESULTS 

A. Cosmopolitan or Hub-Like Units 

1. Internal migration flows 

Geographic subdivisions (or groups of subdivisions) that enter into the bulk of the dendrograms produced by the two-stage procedure at the weakest levels are those with the broadest ties. These are "cosmopolitan", hub-like units, the prototypical example being the French capital, Paris [3, Sec. 4.1] [8]. Similarly, in parallel analyses of other internal migration tables, the cosmopolitan/non-provincial natures of London [78], Barcelona [16] [3, Sec. 6.2, Figs. 36, 37], Milan [3, Sec. 6.3, Figs. 39, 40] (cf. [13]), Amsterdam [3, p. 78] [25], West Berlin [3, p. 80], Moscow (the city and the oblast as a unit) [19] [3, Sec. 5.1 and Figs. 6, 7], Manila (coupled with suburban Rizal) [79], Bucharest [18], Île-de-Montréal [3, p. 87], Zurich, Santiago, Tunis and Istanbul [80] were, among others, highlighted in the respective dendrograms for their nations [3, Sec. 8.2] [15, pp. 181-182] [8, p. 55]. In the 1965-70 intercounty analysis for the US, the most cosmopolitan entities were: (1) the centrally located paired Illinois counties of Cook (Chicago) and neighboring, suburban DuPage; (2) the nation's capital, Washington, D. C.; and (3) the paired South Florida (retirement) counties of Dade (Miami) and Broward (Ft. Lauderdale) [HI [231 El]. In general, counties with large military installations, large college populations or state capitals also interacted broadly with other areas [23, p. 153]. Application of the two-stage methodology to 1965-66 London interborough migration [25] indicated that the three inner boroughs of Kensington and Chelsea, Westminster, and Hammersmith acted, as a unit, in a cosmopolitan manner [3, Sec. 5.2, Fig. 10]. (In Sec. 8.2 and Table 16 of the anthology of results [3], additional geographic units and groups of units found to be cosmopolitan with regard to migration are enumerated.)

It should be emphasized that although the indicated cosmopolitan areas may generally have relatively large populations, this cannot, in and of itself, explain the wide national ties observed, since the double-standardization, in effect, renders all areas of equal overall size. (However, to the extent that larger areas do have fewer zero entries in their corresponding rows and columns, a bias toward cosmopolitanism may in fact be present, which should be carefully considered. Possible corrections for bias were discussed above in Sec. II A.) If one were to obtain a (zero-diagonal) doubly-stochastic matrix all the off-diagonal entries of which were simply $1/(N-1)$, it would indicate complete indifference among migrants as to where they come from and where they go. A maximally cosmopolitan unit would be one for which all the corresponding row and column entries were $1/(N-1)$ (if all the diagonal entries, $m_{ii}$, are a priori zero). (It seems interesting to note that cosmopolitan areas appear to have a certain minimax character, that is, the maximum doubly-stochastic entry for the corresponding row and column tends to be minimized.)
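The minimax observation can be turned into a simple diagnostic: for each unit, record the largest doubly-stochastic entry in its row or column. In a zero-diagonal table this value can never fall below 1/(N - 1), and it reaches that floor only for a maximally cosmopolitan unit. This is an illustrative heuristic of ours (the function name is hypothetical), not a substitute for reading the dendrogram.

```python
def minimax_scores(p):
    """For each unit i, the largest off-diagonal entry in row i or column i
    of a zero-diagonal doubly-stochastic table p (lower = broader ties)."""
    n = len(p)
    return [max(max(p[i][j], p[j][i]) for j in range(n) if j != i)
            for i in range(n)]
```

Sorting units by increasing score then gives a rough cosmopolitan-first ordering.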



2. Trade and interindustry flows 

The nation of Italy possessed the broadest ties in a two-stage analysis of the value of 1974 trade between 113 nations, followed by a closely bound group composed of the four Scandinavian countries [T7] [3, Sec. 5.6, Fig. 22]. In a two-stage study (but using weak
rather than strong components of the associated digraphs) of the 1967 U. S. interindustry 
transaction table, the industry with the broadest (most diffuse) ties was found to be Other 
Fabricated Metal Products pU E2] [21 pp. 13-18]. 






3. Journal citations 



In a two-stage analysis of 22 mathematical journals, the Annals of Mathematics and Inventiones Mathematicae were strongly paired, while the Proceedings of the American Mathematical Society was found to possess the broadest, most diffuse ties [E].

In a recent, large-scale (N > 6000) journal-to-journal citation analysis, decomposing "the 
network into modules by compressing a description of the probability flow", Rosvall and 
Bergstrom preliminarily omitted from their analysis the prominent journals Science, Nature 
and the Proceedings of the National Academy of Sciences [83, p. 1123]. (Those are precisely 
the ones that would be expected to be "cosmopolitan" or hub-like in character, and to be 
highlighted in a corresponding two-stage analysis.) Their rationale for the omission was that 
"the broad scope of these journals otherwise creates an illusion of tighter connections among 
disciplines, when in fact few readers of the physics articles in Science also are close readers 
of the biomedical articles therein". (In [21, pp. 125-153], we reported the results of a partial hierarchical clustering of citations between more than 3,000 journals; this was not a two-stage analysis, but one originally designed and conducted by Henry G. Small and William Shaw. The clusters obtained there were compared with the actual subject-matter classification employed by the Institute for Scientific Information.)

B. Functional Clusters of Units 

1. Internal migration regions 

Geographically isolated (insular) areas, such as the Japanese islands of Kyushu and Shikoku [5], emerged as well-defined clusters (regions) of their constituent (seven and four, respectively) subdivisions ("prefectures" in the Japanese case) in the dendrograms for the two-stage analyses, as did the Italian islands of Sicily and Sardinia [12], the North and South Islands of New Zealand, and the Canadian islands of Newfoundland and Prince Edward Island [3, p. 90] (cf. [84] [85]). The eight counties of Connecticut and other New England groupings, as further examples to be elaborated upon below, were also very prominent in the highly disaggregated U. S. analysis [23]. Relatedly, in a study based solely upon the 1968 movement of college students among the fifty states, the six New England states were strongly clustered [TT1 Fig. 1]. Employing a 1963 Spanish interprovincial migration table, well-defined regions were formed by the two provinces of the Canary Islands and by the four provinces of Galicia [IB] [3, Sec. 6.2.1, Fig. 37]. The southernmost Indian states of Kerala and Madras (now Tamil Nadu) were strongly paired on the basis of 1961 interstate flows [22]. A detailed comparison between the functional migration regions found by the two-stage procedure and those actually employed for administrative, political purposes in the corresponding nations is given in Sec. 8.1 and Table 15 of [3].

It should be noted that the two-stage methodology rarely yields a migration region composed of two or more noncontiguous subregions, even though no contiguity information is, of course, explicitly present in the flow table or provided to the algorithm (cf. [5T1] [86]). A notable exception to this rule was the uniting of the northern Italian region of Piemonte (the location of industrial Turin, where Fiat is based) with (poor) southern regions, before its joining with central regions, in an aggregate 18-region 1955-70 study [13] [3, p. 75] (cf. [12]).

2. Intermarriage and interindustry clusters 

In a two-stage analysis of a 32 × 32 table of birthplace of bridegroom versus birthplace of bride of 1947 Australian intermarriages [87], Greece and Cyprus were the strongest dyad [3, Sec. 5.7, Fig. 25].

In the 1967 US interindustry two-stage (weak component) analysis, two particularly salient pairs of functionally linked industries were: (1) Stone and Clay Products, and Stone and Clay Mining and Quarrying; and (2) Household Appliances and Service Industry Machines (the latter industry purchases laundry equipment, refrigerators and freezers from the former) [101 182] [22 pp. 13-18].

IV. STATISTICAL ASPECTS 

It would be of interest to develop a theory, making use of the rich mathematical structure of doubly-stochastic matrices, by which the statistical significance of apparent hubs and clusters in dendrograms produced by the two-stage procedure could be evaluated [23, pp. 7-8] [88]. In the geographic context of internal migration tables, where nearby areas have a strong predilection for binding (distance aversion), it seems unlikely that most clustering results generated could be considered "random" in any standard sense. On the other hand, other types of "origin-destination" tables, such as those for occupational mobility [89], journal citations [E] [21, pp. 125-153], interindustry (input-output) flows [TU] [52], brand-switches [3, Sec. 9.6] [90], crime-switches [3, Sec. 9.7] [91, Table XII], and (Morse code) confusions [3, Sec. 9.8] [92], among others, clearly lack such a geographic dimension (cf. [93]). An efficient algorithm, considered as a nonlinear dynamical system, to generate random bistochastic matrices has recently been presented [16] (cf. [94] [95]).

In the 1965-70 US 3,140-county migration study, a statistical test of Ling [96] (designed 
for undirected graphs), based on the difference in the ranks of two edges, was employed in 
a heuristic manner [23, pp. 7-8]. For example, the 3,148th largest doubly-stochastic value, 
0.12972 (corresponding to the flow from Maui County to Hawaii County), united the four 
counties of the state of Hawaii. The (considerably weaker) 7,939th largest value, 0.07340 
(the link from Kauai County, Hawaii, to Nome, Alaska), integrated the four-county state of 
Hawaii into a much larger 2,464-county cluster. (Data for the additional [fifth] very small county of Kalawao were only given in the 1995-2000 analysis.) The difference of these two ranks, 4,791 = 7,939 - 3,148, is a measure of the isolation ("survival time") of this state as a cluster. Reference to Table 1 in [23] showed the significance of the state of Hawaii as a functional internal migration unit at the 0.01 level [23, p. 7]. (In the computation of this table, the approximation was used that the number of edges in the relevant digraphs was a negligible proportion of all possible 3,140 × 3,139 edges.)

A. Random digraphs 

Also, the possibility of employing the asymptotic theory of random digraphs [97] [98] for statistical testing purposes was raised in [23]. In this regard, it was necessary to consider the 38,815th largest entry of the doubly-stochastic matrix to complete the hierarchical clustering of the 3,140 counties. The probability is 0.973469 that a random digraph with 3,140 nodes and 38,814 links is strongly connected [98, p. 361], where $0.973469 = e^{-2e^{-4.30917}}$ and $38{,}814 = 3140(\log 3140 + 4.30917)$, with log denoting the natural logarithm. Evidence of systematic structure in the migration flows can thus be adduced, since the digraph based on the 38,814 greatest-valued links was not strongly connected [23, p. 8] (cf. [99]), whereas a random digraph with as many links would be strongly connected with probability of about 0.97. (For our 1995-2000 analysis [Sec. V], the counterpart of the probability 0.973469 is 0.107134.)
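The probability quoted above follows from the asymptotic result for random digraphs with N nodes and M = N(log N + c) links, namely P(strongly connected) ≈ exp(-2e^{-c}), with log the natural logarithm. A short check, under that asymptotic approximation and with a hypothetical function name, recovers c ≈ 4.30917 and P ≈ 0.973469 for the 1965-70 figures.

```python
import math


def strong_connectivity_prob(n, m):
    """Asymptotic probability that a random digraph with n nodes and
    m = n(log n + c) links is strongly connected: exp(-2 exp(-c))."""
    c = m / n - math.log(n)
    return math.exp(-2.0 * math.exp(-c))


if __name__ == "__main__":
    print(38814 / 3140 - math.log(3140))          # c for the 1965-70 data
    print(strong_connectivity_prob(3140, 38814))  # probability quoted above
```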

In a random digraph with a large number of nodes, the probability is close to one that all nodes are either isolated or lie in a single ("giant") strong component. The existence of intermediate-sized clusters is thus evidence of non-randomness, even if such groups are not themselves significant according to the isolation (difference-of-ranks) criterion of Ling [96]. With randomly generated data and many taxonomic units, one would expect the two-stage procedure to yield a dendrogram exhibiting complete chaining. So, although single-linkage
clustering is often criticized for producing chaining, chains can also be viewed simply as 
indications of inherent randomness in the data. In contrast to single-linkage clustering, 
strong component hierarchical clustering can merge more than two clusters (children) into 
one (parent) node. This serves to explain why fewer clusters (2,245) were generated in the 
intercounty migration study than the 3,139 that single-linkage (in the absence of ties) would 
produce. 

B. A cluster-analytic isolation criterion 

Dubes and Jain [100] provided "a semi-tutorial review of the state-of-the-art in cluster validity, or the verification of results from clustering algorithms". Among other evaluative standards, they discussed isolation criteria, which "measure the distinctiveness or separation or gaps between a cluster and its environment". Such a statistic was developed and applied in [101] in order to extract a small proportion of 5,385 clusters (3,140 of them single units, 673 pairs, 230 triples, 104 quartets, ...) for detailed examination based on the two-stage analysis of the 1965-1970 United States intercounty migration table [23].

The largest value of the isolation criterion, for all clusters of fewer than 2,940 units, was attained by a region formed by the eight constituent counties of the state of Connecticut. (Groups formed by the application of the two-stage procedure to interareal migration data are, as a strong rule, composed of contiguous areas [3] [15]. This occurs even in the absence of contiguity constraints, reflecting the distance decay of migration.) The 11,080th largest doubly-standardized entry, 0.05666, corresponding to movement from New Haven to (New York City suburban) Fairfield, unified these eight counties. Viewing the clustering procedure as an agglomerative one, Connecticut was not absorbed into a larger region until the 16,047th largest doubly-standardized value, 0.04085 (the functional linkage from Litchfield, Connecticut to Berkshire, Massachusetts). The isolation criterion i for Connecticut is set equal to

$$ i = -\log\left[\left(\frac{8 \times 7 + 3132 \times 3131}{3140 \times 3139}\right)^{16047 - 11080}\right] = 25.3175 \qquad (1) $$



The term in large parentheses is the proportion of cells in the 3,140 × 3,140 table associated with either movement within (8-county) Connecticut or within the set of 3,132 complementary counties (since intracounty flows are not available, a diagonal correction is made). This term, raised to the power shown, is the probability (unadjusted for occupied cells) that none of 4,967 = 16,047 - 11,080 consecutive doubly-standardized values would correspond to movement between Connecticut and its complement; the logarithm is natural. Any such Connecticut-complement linkage could have resulted in a merger, which was not observed. (In our 1995-2000 analysis [Sec. V], we analogously find 3,132 = 12,107 - 8,975, yielding i = 16.1339, still the most significant of any of the fifty-two 8-county clusters there.) (For further details, including maps, discussion and extensive applications of the isolation criterion developed to the U. S. intercounty analysis, see [101].) The isolation score i for the cluster formed by the four counties of Hawaii, discussed above, was 12.21, while the District of Columbia had the highest score, 23.81, for any single county [101, Table I].
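Equation (1) generalizes to any k-unit cluster in an n-unit zero-diagonal table. The sketch below (hypothetical function name; natural logarithm, which reproduces the published values) recomputes the Connecticut scores for both periods and, using the Hawaii ranks 3,148 and 7,939 quoted in Sec. IV, gives a score of about 12.21, in agreement with the value reported above.

```python
import math


def isolation(k, n, rank_formed, rank_absorbed):
    """Isolation score of a k-unit cluster in an n-unit zero-diagonal table.
    p is the share of off-diagonal cells lying within the cluster or within
    its complement; p ** (rank_absorbed - rank_formed) is the chance that
    none of those consecutive values links the cluster to its complement."""
    inside = k * (k - 1) + (n - k) * (n - k - 1)
    p = inside / (n * (n - 1))
    return -(rank_absorbed - rank_formed) * math.log(p)


if __name__ == "__main__":
    print(isolation(8, 3140, 11080, 16047))  # Connecticut, 1965-70
    print(isolation(8, 3107, 8975, 12107))   # Connecticut, 1995-2000
    print(isolation(4, 3140, 3148, 7939))    # Hawaii, 1965-70
```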



V. TWO-STAGE ANALYSES OF U. S. INTERCOUNTY MIGRATION FLOWS 

A 3,107 × 3,107 migration table for the United States for the period 1995-2000 can be readily constructed from freely available data at the website http://www.census.gov/population/www/cen2000/ctytoctynow/index.html. We have been able to conduct a two-stage analysis of this table. In the 1965-70 analysis [23], 3,140 of 3,141 units had been utilized, with Loving County, TX, the smallest US county, being omitted since it had no recorded in-migrants. The reduction to 3,107 units in the 1995-00 table is due to the administrative amalgamation, in the later data, of 34 independent cities of Virginia with neighboring counties. (Loving County is included in the later analysis, as is the second smallest, and poorest, US county, Kalawao County, HI.) The 1965-70 table was 94.5% sparse (zero entries), and the 1995-00 table 92.3% sparse (the difference perhaps being largely due to the Census sampling design). In Figs. 5 and 6 we present matrix plots of the unadjusted and adjusted tables. (The states are ordered alphabetically, not by postal ["zip"] code, and the counties alphabetically within states. County No. 1 is Autauga, AL; County No. 1000 is Boyd, KY; County No. 2000 is Dunn, ND; and County No. 3,107 is Weston, WY.) The largest square on the diagonal in both these figures corresponds to the state with the most counties (254), that is, Texas. (The double-standardization, giving a more pronounced block-diagonal structure, brings out more strongly intrastate movements, which would clearly tend to be favored over interstate ones due to effects of distance and possible state loyalties and ties.)

FIG. 5: Unadjusted 1995-2000 intercounty U. S. migration table. The large square near the end of the diagonal (for alphabetical reasons) corresponds to the state with the most (254) counties, Texas.



A. Most cosmopolitan units 

In Fig. 7 we show the most cosmopolitan counties or groups of counties based on the doubly-standardized values themselves, while in (the less flat) Fig. 8 the ordinal rank of the doubly-standardized value is used instead. The doubly-standardized values associated with the evolution from beginning to end of the hierarchical clustering range from 0.530385 to 0.0225427, while the ranks extend from 7 to 25,329. The comparable statistics for the 1965-70 analysis based on 3,140 units were 0.47730 to 0.01659 and 24 to 38,815. So, ignoring any possibly necessary corrections due to the slightly different sizes (3,140 vs. 3,107) in the two periods and different degrees of sparsity (94.5% vs. 92.5%), one might conclude (since 0.0225427 > 0.01659, that is, the final merger in the earlier period took place at a weaker link) that the most cosmopolitan counties in the earlier analysis were more so (less "provincial") than the most cosmopolitan counties in the later period. (For the choice of the most appropriate locations at which to truncate dendrograms so as to distinguish cosmopolitan from provincial units, see Sec. V B 3.)

FIG. 6: Doubly-stochastic form of the 1995-2000 intercounty U. S. migration table.



FIG. 7: Truncated dendrogram, showing the most cosmopolitan counties and groups of cosmopolitan counties, based upon doubly-standardized 1995-2000 intercounty migration flows. To obtain a distance-like (dissimilarity) measure, we subtract the doubly-stochastic values from the largest such value, 0.530385.

The non-truncated (master) versions of these two figures (along with their counterparts based on the [smoothed] second power or square of the doubly-stochastic table) are given in the Electronic-only material and will be examined in Sec. V B. In Fig. 9 we show the ultrametric fit to the doubly-stochastic table generated by the hierarchical clustering procedure, and in Fig. 10, the residuals from this fit. As a measure of goodness-of-fit, let us take the ratio of the sum of squares of the residuals from the largest k = 25,329 entries (the number needed to complete the hierarchical clustering process) to the sum of squares of the 25,329 entries themselves. This ratio was 0.30861. In Fig. 11 we show this measure of fit as a function of k. The minimum (best fit) of 0.0964163 is reached for the 7,229th largest doubly-stochastic entry, 0.0766761.
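The goodness-of-fit ratio just described can be computed as follows, assuming the fitted ultrametric matrix u (for example, the threshold at which two counties first share a cluster) is available; evaluating it for successive k traces a curve like that of Fig. 11. Names are hypothetical, and ties among the largest entries are broken by cell position.

```python
def fit_ratio(p, u, k):
    """Sum of squared residuals p - u over the k largest off-diagonal entries
    of p, divided by the sum of squares of those k entries."""
    n = len(p)
    top = sorted(((p[i][j], i, j) for i in range(n) for j in range(n)
                  if i != j), reverse=True)[:k]
    ssr = sum((v - u[i][j]) ** 2 for v, i, j in top)
    sst = sum(v * v for v, _, _ in top)
    return ssr / sst
```

A curve such as Fig. 11 would then be `[fit_ratio(p, u, k) for k in range(1, K + 1)]` for some upper limit K, with the best fit at its minimum.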

FIG. 8: Truncated dendrogram, showing the most cosmopolitan counties and groups of cosmopolitan counties, based upon ordinal rankings of doubly-standardized 1995-2000 intercounty migration flows.

The list of most cosmopolitan counties is much more "Sunbelt"-oriented in nature than in the 1965-70 analysis [23] discussed above. (Let us note a technical point: even though our strong component hierarchical clustering procedure can, and often does, unite more than two smaller clusters, in order to fit within the Mathematica hierarchical clustering framework we have to map our results into an equivalent hierarchical clustering in which only binary mergers occur, though these may occur at equal thresholds. In our actual 1995-00 clustering, there were 2,497 mergers, as opposed to the 3,106 binary ones. The comparable figures for 1965-70 were 2,245 and 3,139.)
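The mapping from multiway to binary mergers mentioned above can be sketched as follows: a merger of k clusters at threshold t is replaced by k - 1 successive binary mergers, all at t. The input and output layouts here are our own assumptions (they are not Mathematica's format); cluster ids follow the common convention that leaves are 0, ..., n - 1 and each merger creates the next id. A full hierarchy on n units then always yields n - 1 binary mergers, as in the counts 3,106 and 3,139 above.

```python
def binarize(n, merges):
    """Rewrite multiway strong-component mergers as binary ones.
    merges[m] = (threshold, children) creates cluster id n + m; leaves are
    0..n-1. Output row r = (threshold, a, b) creates id n + r, so a k-way
    merger becomes k - 1 binary mergers at the same threshold."""
    out = []
    remap = {v: v for v in range(n)}      # original id -> binary id
    for m, (t, children) in enumerate(merges):
        kids = [remap[c] for c in children]
        acc = kids[0]
        for c in kids[1:]:
            out.append((t, acc, c))
            acc = n + len(out) - 1        # id of the cluster just formed
        remap[n + m] = acc
    return out
```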

FIG. 9: Ultrametric fit to the 1995-2000 doubly-stochastic internal migration flow table (Fig. 6).

The leading cosmopolitan counties found (and some of their apparently migration-relevant features), in decreasing order, are:

(1) Brevard, FL (the "Space Coast", the Kennedy Space Center); 

(2) Mohave, AZ (Lake Havasu, Grand Canyon); 

(3) Clark, NV (Las Vegas); 

(4) Hillsborough and Pinellas, FL, which are grouped with the pair (represented by the topmost box with "2" inside it) Pasco and Hernando, FL. (This quartet, having an isolation index of 11.9717, is completely coterminous with the governmentally designated Tampa-St. Petersburg-Clearwater Metropolitan Statistical Area [MSA]. Additionally, Pasco and Hernando have the greatest isolation index, 14.6413, of any pair in the entire analysis [see the table in Sec. V B 2].);

(5) The next lower box with "2" in it stands for the southern Gulf Coast dyad formed by Collier County (East Naples) and Lee County (Fort Myers, a single-county MSA), FL;

(6) San Diego, CA; 

(7) Dallas, TX; 

(8) Maricopa, AZ (Phoenix); 

(9) Cook, IL (Chicago); 

(10) Orange, Seminole, and Osceola, FL (corresponding to the upper box with "3" in it; 
these three counties, along with Lake County, form the Orlando-Kissimmee MSA); 

(12) Sumter and Lake, FL (the next box with "2" in it); 

(13) Monroe, FL (Key West); 

(14) Cumberland, NC (the giant Fort Bragg and Pope Air Force Base); 

(15) Mecklenburg, NC (Charlotte); 

(16) Martin, St. Lucie and Indian River, FL (the lower box containing "3"; Indian River 
borders Brevard County, the most cosmopolitan county nationally); 

(17) Palm Beach, FL, together with the pair Miami-Dade and Broward, FL (the lowest 
box with "2" in it); this southeastern Florida triad comprises the Miami-Fort Lauderdale- 
Pompano Beach MSA, highlighted in gray in the master dendrograms [Electronic-only 
material]; 

(18) Hennepin, MN (Minneapolis); 

(19) Marion, FL (bordering the (17) cluster on the north); 

(20) Bell, TX (Fort Hood); 

(21) Polk, FL (Lakeland); 

(22) Citrus, FL (formerly part of Hernando County); 

(23) Weld, CO (Greeley); 

(24) Larimer, CO (Fort Collins); 

(25) Kenai Peninsula, AK (Seward); 

(26) Los Angeles, CA; and 

(27) Pierce, WA (Fort Lewis and McChord Air Force Base). 

FIG. 10: Residuals (mostly negative) from the fit of the ultrametric structure (Fig. 9) to the 
1995-2000 doubly-stochastic internal migration flow table (Fig. 6) 

FIG. 11: Goodness-of-fit of the ultrametric structure (Fig. 9) to the k largest 1995-2000 doubly- 
stochastic values. The best fit, 0.0964163, is attained at rank 7,229, corresponding to a doubly- 
stochastic value of 0.0766761. At that threshold, there are 388 clusters (strong components). 

The comparable list for the earlier period 1965-70 takes the form [HU Table 1]: (1) Cook 
and DuPage, IL; (2) District of Columbia; (3) Dade and Broward, FL; (4) Pierce, WA; (5) 
Harris, TX (Houston); (6) Riverside and San Bernardino, CA; (7) Orange, CA; (8) Lake, 
IL (lying in the Chicago metropolitan area); (9) Monroe, FL; (10) Los Angeles, CA; (11) 
Pinellas, FL; (12) Brevard, FL; (13) Polk, FL; (14) Pulaski, MO (Fort Leonard Wood); (15) 
Geary, KS (Fort Riley); (17) Wayne, MI (Detroit); (18) Bell, TX; (19) Hillsborough, FL; 
(20) El Paso, CO (Air Force Academy); (21) Ventura, CA; (22) Cumberland, NC; (23) St. 
Louis County and City, MO; (24) Norfolk, VA (Atlantic Fleet headquarters); (25) Arlington 
County and Alexandria City, VA; and (26) Sedgwick, KS (Wichita, McConnell Air Force 
Base). 

We see that this 1965-70 list is relatively weaker than the 1995-00 list in terms of Sunbelt 
counties, but relatively stronger in terms of counties with large military installations (though 
Brevard County, Florida does have Patrick Air Force Base). We note, in particular, that the 
first non-Sunbelt county in the 1995-00 list (that is, Cook, IL) is ninth there, but (coupled with 
DuPage, IL) was most cosmopolitan in the earlier analysis. Also, the District of Columbia 
(colored pink in the associated master dendrograms [Electronic-only material], p. 3), which 
had the lowest threshold of isolation of any single county in the 1965-70 analysis [23, p. 31], 
slips very substantially. 

The most immediate explanation for the relative decrease in cosmopolitanism of counties 
with large military installations would appear to be the elimination of the draft in 1973 
(after which, it would seem, military installations were relatively less populated by transient, 
recently migrant draftees), as well as the downsizing of the military since the Vietnam 
War. (The peak of 2.4 million troops was reached in 1969, while in 2000 there were some 
1.384 million military personnel.) 

B. Migration regions 

1. Selected features 

We find, in the first two (searchable) master (non-truncated) dendrograms (Electronic- 
only material), that the states of Hawaii (red, i = 14.121, p. 2), Connecticut (blue, 
i = 16.1339, p. 2) and Rhode Island (green, i = 11.8384, p. 3) are reconstituted from 
their respective counties. (The most cosmopolitan county in Hawaii is Honolulu, and in 
Connecticut, Fairfield [a "bedroom suburb" of New York City]. Both Hawaii and Connecticut 
emerged as clusters in the 1965-70 analysis, while all the counties of Rhode Island, except 
for historic Newport, were grouped.) In both analyses, the fifteen southern counties (colored 
black) of Maine are clustered (p. 8). (The northernmost, omitted county, Aroostook, is 
agricultural, Canadian-oriented, and well recognized as highly anomalous in terms of the general 
character of Maine [23, p. 48, pp. 118-119].) In the 1995-00 dendrograms (Electronic-only 
material), these fifteen counties immediately merge with six of the ten Lake Region counties 
of New Hampshire. The five counties of Rhode Island are also strongly linked with seven 
(or eight) Massachusetts counties (Table I). 

The five Pennsylvania counties (colored orange) of the Philadelphia-Camden-Wilmington 
MSA (p. 5) were grouped in both analyses, as were, similarly, the four New York metropolitan 
counties (colored brown) of Long Island (p. 2). (Their isolation indices in the 1995-2000 
analysis are relatively weak, namely 3.55119 and 2.50869, respectively.) 

In the 1965-70 analysis, the dyad forming first in the agglomerative process was composed 
of the South Dakota counties of Dewey and Ziebach (p. 17), which together form the 
Cheyenne River Indian Reservation. It is the sixth such couple in the later analysis, with the 
four pairs now forming first being: (1) Stewart and Webster, Georgia (p. 17) (these two 
counties will be found to have the largest associated diagonal entries when the doubly-stochastic 
table is squared in Sec. V C); (2) Garfield and Petroleum, Montana (p. 32); (3) the Eastern 
Shore of Virginia, that is, Accomack and Northampton Counties (p. 3 and Table I); and (4) 
Cassia and Minidoka Counties, Idaho, which form the Burley Micropolitan Statistical Area 
(p. 2 and Table I). Further, the interstate Jackson Micropolitan Statistical Area, formed by 
Teton County, Idaho and Teton County, Wyoming, also comprises a strongly bound pair (p. 5). 
Our master dendrograms end with the pair of Alabama counties, lying in the Montgomery 
MSA, of Autauga and Elmore. 

The (predominantly black) Mississippi Delta is defined by Wikipedia as consisting 
of seventeen counties. Thirteen of these counties can be found in a certain fifteen-county 
1995-00 cluster (colored magenta, having i = 2.8686, p. 23). (In the 1965-70 analysis, we 
noted a six-county subcluster [23, p. 57].) The southernmost member of the seventeen- 
county group, Warren County (Vicksburg), is omitted from this cluster (along with 
Washington, Carroll and Holmes Counties). 

The San Joaquin Valley of California is defined by Wikipedia as comprised of seven 
counties. These seven, plus Madera County, are clustered (light green, p. 6). 

The six California counties that form the North Coast American Viticultural Area also 
function as a migration region (light brown, p. 3). 

The three New York counties (Chautauqua, Cattaraugus and Allegany) forming the 
"Southern Tier" are highlighted in light blue (p. 7). 

2. Most well-defined 1995-2000 migration regions 

"South Jersey" is, according to Wikipedia, composed of eight New Jersey counties. With 
the omission of its most northern member, Ocean County (in fact classified for governmental 
purposes as part of the New York metropolitan area), the seven (Philadelphia-oriented) counties 
form a very well-defined migration region (colored light orange, i = 28.7301, p. 5; for 
1965-70, i = 20.8996 [23, p. 64] [101]). In fact, arranged in terms of decreasing values of i 
for the period 1995-2000, this region emerges as the most well-defined in the entire analysis 
(Tables I-II). (Many of the values of i given in the tables for the 1965-70 period are available 
in [101].) Since all the values of i for 1995-2000 listed are larger than 
$-\ln(1/100000) = \ln 10^{5} = 11.5129$, we can infer that all these regions are significant 
at the 0.00001 level. 
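The significance statement rests on one piece of arithmetic. Assuming, as the stated threshold implies, that a region with isolation index i is significant at level e^(-i), the cutoff for the 0.00001 level is -ln(0.00001), about 11.5129 (natural logarithm). The short sketch below checks this against a few 1995-2000 indices from Tables I-II; the helper `significance_level` is a hypothetical name, not the authors' code.

```python
import math

# Assumption (inferred from the threshold in the text): a region whose
# isolation index is i is significant at level exp(-i).
ALPHA = 1e-5
THRESHOLD = -math.log(ALPHA)  # natural logarithm, about 11.5129


def significance_level(i):
    """Hypothetical helper: significance level implied by isolation index i."""
    return math.exp(-i)


# A few 1995-2000 isolation indices taken from Tables I-II.
indices = {
    "South Jersey": 28.7301,
    "French Louisiana": 16.7764,
    "Rhode Island": 11.8384,
    "Central Appalachia": 11.7459,
}
print(f"threshold = {THRESHOLD:.4f}")
for region, i in indices.items():
    print(region, i > THRESHOLD, f"{significance_level(i):.2e}")
```

Even the weakest listed region, Central Appalachia, clears the cutoff, which is what licenses the blanket statement for Tables I-II.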

French Louisiana is defined by Wikipedia as the amalgamation of Acadiana/"Cajun 
Country" (22 parishes) and Greater New Orleans (7 parishes), with St. Charles and St. John 
the Baptist common to both, giving a 27-parish region. Our analysis also yields a well-defined 
(i = 16.7764) 27-parish region, having 24 parishes in common with the Wikipedia definition. 
Our candidate region contains three additional parishes, Allen and (the Mississippi-bordering 
pair of) East Feliciana and West Feliciana, but lacks Avoyelles (immediately adjacent, however, 
in the dendrogram), Orleans (coextensive with the City of New Orleans) and St. Tammany 
(also in the New Orleans metropolitan area). (The last two parishes, located on pp. 9 and 15 
of the first two master dendrograms, are relatively cosmopolitan, as might be anticipated 
from their wide [pre-Hurricane Katrina] renown.) 
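The comparison of a clustered region with a published definition is plain set arithmetic: the reference region is the union of its two parts (22 + 7 - 2 = 27 parishes, since two are shared), and the cluster is compared with it by intersection and differences. In the sketch below only the parishes named in the text are real; labels such as `acadiana_k` and `gno_k` are hypothetical stand-ins for unnamed parishes, and `compare_regions` is an illustrative helper rather than the authors' procedure.

```python
def compare_regions(cluster, reference):
    """Return the members shared with, added to, and missing from a reference definition."""
    return cluster & reference, cluster - reference, reference - cluster


shared_pair = {"St. Charles", "St. John the Baptist"}
# Hypothetical placeholders stand in for parishes the text does not name.
acadiana = shared_pair | {"Avoyelles"} | {f"acadiana_{k}" for k in range(19)}
greater_new_orleans = shared_pair | {"Orleans", "St. Tammany"} | {f"gno_{k}" for k in range(3)}
reference = acadiana | greater_new_orleans  # 22 + 7 - 2 = 27 parishes

lacking = {"Avoyelles", "Orleans", "St. Tammany"}
added = {"Allen", "East Feliciana", "West Feliciana"}
cluster = (reference - lacking) | added  # the 27-parish migration cluster

common, extra, missing = compare_regions(cluster, reference)
print(len(reference), len(cluster), len(common))
print(sorted(extra))
print(sorted(missing))
```

The same helper applies to the other comparisons in this section, for example thirteen of the seventeen Mississippi Delta counties, or twenty-two of the twenty-seven Northern Lower Michigan counties.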

The Northern New England region is composed of the three states of Maine, New Hamp- 
shire and Vermont, plus the two (mutually well-separated) Massachusetts counties of (west- 
ern) Berkshire and (northeastern) Essex. 

Our Northern Lower Michigan region is composed of twenty-six counties, twenty-two of 
which are contained in the twenty-seven-county Wikipedia definition. Our region, however, 
extends further to the southeast around Saginaw Bay, with Isabella, Midland, Bay and 
Saginaw counties, and omits five of Wikipedia's southwesterly situated ones (Wexford, 
Missaukee, Osceola, Lake and Mason). 

Region                          States          Page   Counties  i (1995-00)  i (1965-70)
South Jersey                    NJ              6      7         28.7301      20.8996
Glades + Hendry + Okeechobee    FL              1      3         23.474       -
"Delmar" + Baltimore            DE,MD           5      15        20.283       -
Western Ohio + Randolph, IN     OH,IN           25     14        20.0938      -
Western New York                NY              7      18        19.4948      -
Rhode Island + S. E. Mass.      RI,MA           3      12        18.6991      -
Greater Orlando                 FL              1      3         17.6523      -
Northern Lower Michigan         MI              8,9    26        17.2098      -
French Louisiana                LA              30,31  27        16.7764      -
Brevard                         FL              1      1         16.3097      19.6942
Golden Triangle (Beaumont +)    TX              4      6         16.1803      -
Connecticut                     CT              2      8         16.1339      25.3175
Mohave (Kingman)                AZ              1      1         15.463       6.39121
Clark (Las Vegas)               NV              1      1         15.1784      6.23128
Rexburg, ID + Jackson, WY MSAs  ID,WY           5      4         15.0882      -
Eastern Rust Belt               NJ,OH,PA,WV     24     82        15.0412      -
Burley MSA                      ID              2      2         14.8809      -
Pasco + Hernando                FL              1      2         14.6413      -
San Diego                       CA              1      1         14.2408      12.5938
Maysville MSA + 3 counties      KY              19     5         14.1822      -
Hawaii                          HI              2      5 (a)     14.121       12.21
Northern High Plains            MT,ND,NE,SD     36,37  55        13.8799      -
Middle Ohio Valley              IN,KY           24,25  27        13.821       -
Eastern Shore                   VA              3      2         13.7051      -

TABLE I: Most well-defined 1995-2000 migration regions and their isolation indices 
(a) A fifth county, Kalawao, was included in the 1995-00 data, but not in 1965-70. 

Region                          States          Page   Counties  i (1995-00)  i (1965-70)
Dallas                          TX              1      1         13.5473      14.8557
Maine + 7 NH counties           ME,NH           8      22        13.4716      -
Southeastern Arizona            AZ              2      3         13.3503      -
Maricopa (Phoenix)              AZ              1      1         13.2608      12.5479
Eastern Upstate New York        NY              7      28        13.3052      -
Michigan Thumb                  MI              6      6         13.2208      -
Wasatch Back                    UT              11     8         13.1616      -
N. Vermont + Coos, NH           NH,VT           11     10        13.0778      -
S. Central Tennessee            TN              22     10        13.3092      -
Northeast South Carolina        SC              15     8         13.0276      -
Northern New England            MA,ME,NH,VT     9,10   42        12.8446      -
Cook (Chicago)                  IL              1      1         12.7682      16.8933
Southeastern Indiana            IN              25     10        12.7172      -
Northwestern Lower Michigan     MI              9,10   9         12.6567      -
High Colorado Rockies           CO              3      3         12.5892      -
Joplin Area                     MO              5      3         12.3071      -
Central Savannah River          GA              22     4         12.2086      -
Southern Maryland               MD              3      3         12.1217      -
Amarillo (Potter + Randall)     TX              1      2         12.0528      8.16948
Tampa MSA                       FL              1      4         11.9717      -
York + Adams                    PA              3      2         11.9433      13.7789
Lake + Sumter                   FL              1      2         11.8635      -
Rhode Island                    RI              3      5         11.8384      11.7668 (a)
Central Appalachia              MD,NC,TN,VA,WV  27,28  77        11.7459      -

TABLE II: Most well-defined 1995-2000 migration regions and their isolation indices (cont.) 
(a) Newport County was not directly clustered with the other four counties of the state in 1965-70. 

Adams County, PA was created from part of York County, PA (Table II). 

We did omit from Table I the rather anomalous twelve-county, four-state (ID, OR, UT, 
WA) cluster found on p. 19 of the dendrograms, even though it has i = 18.1505. It is 
essentially composed of two remote, noncontiguous sets of areas (OR-WA and ID-UT) 
united by the links Clark, ID → Sherman, OR (having a doubly-stochastic value of 0.270946, 
the 261st largest) and Skamania, WA → Bear Lake, ID (0.135147, the 2,655th largest). 
(We have no immediate explanation for these surprisingly large values.) 



3. Cosmopolitan/Provincial Boundary 

There is a large 2,423-county cluster (lacking all of the New England and Hawaiian 
counties), having the high value i = 27.7726, stretching in the dendrogram from Navarro, 
TX (p. 9) until the very end (Autauga, AL). (This might be considered a domain of less 
cosmopolitan counties or groups of counties.) It is a subcluster of a 2,588-county cluster 
(i = 27.7304) stretching from Quay, NM (p. 7), again to the end. Still larger, but somewhat 
weaker, is a 3,069-county cluster, i = 20.2582, extending from Salt Lake, UT (p. 1) until 
the end. Further, starting with Wayne, NC, but excluding Androscoggin, ME (p. 8), there is 
a 2,483-county cluster extending to the end with i = 19.7518. If we were to map such results, 
the cosmopolitan counties would, it seems, form "archipelagos" in a "sea" of provincial 
counties. 



C. Master dendrograms based on the square of the doubly-stochastic table 

FIG. 12: Square of the doubly-stochastic form (Fig. 6) of the 1995-2000 intercounty U. S. migration 
table. Only 2.82% of the entries of this matrix are 0, compared with 92.3% in the unsquared matrix 

Matrix multiplying the doubly-stochastic form of the 1995-2000 (zero-diagonal) intercounty 
migration table by itself, we obtain another doubly-stochastic table (Fig. 12), since the product 
of doubly-stochastic matrices is again doubly stochastic, but now one with non-zero diagonal 
entries. These ranged from highs of 0.314164 and 0.3060604 for the members of a pair of small 
Georgia counties, Webster and Stewart (we recall that these two counties formed the first 
cluster in the hierarchical [agglomerative] process), to lows for the Hawaii counties of Kalawao 
and Kauai (0.0000253087 for Kauai). (Kalawao County, once the site of a leprosy colony, was 
in many respects anomalous because of its very small size, and might in retrospect have been 
omitted from the analyses. Of course, if a county has a large associated diagonal entry in the 
newly derived doubly-stochastic table, then, since its row and column still sum to 1, its 
off-diagonal entries, which are the only ones that affect its clustering properties, will tend to 
be reduced in size.) The resulting master dendrograms (presented in the Electronic-only 
material, along with the ultrametric [ordinal] fit in Fig. 13, the residuals from this fit in 
Fig. 14 and the goodness-of-fit measure in Fig. 15), again employing the strong component 
hierarchical clustering methodology, are even more biased in cosmopolitanism toward Sunbelt 
counties. (The largest 163,341 doubly-stochastic values were required to complete the strong 
component hierarchical clustering, many more than the 25,329 needed in the original 
[unsquared] analysis.) 
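Two facts used in this subsection can be checked on a toy example: squaring a zero-diagonal doubly-stochastic matrix gives another doubly-stochastic matrix, and its diagonal generally becomes non-zero. The sketch below assumes that double standardization is carried out by iterative proportional fitting (alternately scaling rows and columns to unit sums, the Sinkhorn-Knopp iteration); the 3 x 3 flow counts and the function names are hypothetical.

```python
def double_standardize(flows, tol=1e-10, max_iter=10_000):
    """Alternately rescale rows and columns to unit sums (iterative
    proportional fitting); zero entries, such as the diagonal, stay zero."""
    n = len(flows)
    m = [[float(x) for x in row] for row in flows]
    for _ in range(max_iter):
        for row in m:                      # make every row sum to 1
            s = sum(row)
            for j in range(n):
                row[j] /= s
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        for row in m:                      # make every column sum to 1
            for j in range(n):
                row[j] /= col_sums[j]
        if all(abs(sum(row) - 1.0) < tol for row in m):
            return m
    raise RuntimeError("iterative proportional fitting did not converge")


def matmul(a, b):
    """Plain matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical zero-diagonal migration counts between three areas.
flows = [[0, 30, 10],
         [20, 0, 40],
         [5, 15, 0]]
p = double_standardize(flows)
q = matmul(p, p)
print([round(p[i][i], 6) for i in range(3)])   # diagonal of p: all zero
print([round(q[i][i], 6) for i in range(3)])   # diagonal of p squared: positive
print([round(sum(row), 6) for row in q])       # row sums of p squared: all 1
```

Each diagonal entry of the square is the probability of a two-step return (out and back), which is why a tightly linked pair such as Webster and Stewart ends up with large diagonal values.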

FIG. 13: Ultrametric fit to the square of the 1995-2000 doubly-stochastic internal migration flow 
table (Fig. 12) 

FIG. 14: Residuals (mostly negative) of the fit of the ultrametric structure (Fig. 13) to the square 
of the 1995-2000 doubly-stochastic internal migration flow table (Fig. 12) 

There are obviously many interesting, significant features in these figures, as the many 
long strings of counties within single states, apparent upon examination, would indicate. 
The isolation index for the seven-county "South Jersey" migration region (p. 7) has now 
climbed to 49.8337, while the five clustered counties of Hawaii (p. 12) have an index of 
34.4687. There is a still higher index of 64.2316 for a 39-county region (pp. 6-7) composed 
of all the counties of Maine, New Hampshire and Vermont except Bennington County, VT 
(p. 6), which is adjacent to New York and is clustered instead with counties of that 
(non-New England) state. This 39-county tri-state region is included along with Eastern 
Upstate New York counties in a 66-county region (with a very high i = 118.237). There is 
now also a 7-county Connecticut region (lacking Fairfield County, the New York City 
"bedroom" suburb), having i = 32.913 (p. 5). 

Additionally, to identify some of the other prominent clusters, we have a 15-county New 
York region (i = 34.0272, p. 13), a 12-county Arkansas region (i = 32.9012, pp. 8-9), a 
20-county (northeastern) North Carolina region (i = 32.5555, pp. 14-15), a 32-county Ohio 
region (i = 29.8344, pp. 18-19), a 16-county (western) South Carolina region (i = 29.9287, 
p. 13), an 83-county NJ-NY-PA-WV ("Rust Belt") region (i = 26.9231, pp. 12-13), a 
17-county tri-state "Delmarva" region (i = 26.1837, p. 13), and a joining of seven northern 
Florida counties with seventy-seven of Georgia (i = 24.5441, pp. 29-30). 



D. Use of teleporting random walk 

Motivated by the widespread interest in, and emulation of, the PageRank algorithm used 
by popular search engines such as Google [HE1 EH], we took a weighted combination of 
the doubly-stochastic 1995-2000 U. S. intercounty migration table and the zero-diagonal 
3,107 x 3,107 doubly-stochastic table with all its off-diagonal entries equal to 1/3,106. A 
weight of 0.9 was applied to the former table, and 0.1 to the latter table. 

FIG. 15: Goodness-of-fit of the ultrametric structure to the k largest values in the square of the 
1995-2000 doubly-stochastic table. The best fit, 0.130718, is attained at rank 12,603, correspond- 
ing to a doubly-stochastic value of 0.0206195. At that threshold, there are 377 clusters (strong 
components). 

Again, precisely (the largest) 25,329 entries of the resultant table were needed to complete 
the (strong component) hierarchical clustering process. We noted that "South Jersey" 
was again a 7-county cluster with precisely the same isolation index of 28.7301 (Table I). 
Overall, our original clustering appeared to be totally robust in its qualitative features to 
the effect of the teleporting random walk, at least with the particular weights (0.9, 0.1) we 
employed. Potentially, if the square of the doubly-stochastic table were similarly teleported, 
the associated hierarchical clustering results might not be so robust. 
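The teleported table can be sketched as follows, assuming, as in the text, a zero-diagonal doubly-stochastic input and a uniform zero-diagonal matrix whose off-diagonal entries are 1/(N - 1), that is, 1/3,106 for N = 3,107. Every off-diagonal entry p is mapped to 0.9p + 0.1/(N - 1), a strictly increasing function, so the rank order of the off-diagonal entries cannot change; this is consistent with the identical ordinal clustering reported above. The function name `teleport` and the 4 x 4 table are hypothetical.

```python
from itertools import combinations


def teleport(p, weight=0.9):
    """Blend a zero-diagonal doubly-stochastic matrix with the uniform
    zero-diagonal one (off-diagonal entries 1/(n-1)), PageRank-style."""
    n = len(p)
    uniform = 1.0 / (n - 1)
    return [[0.0 if i == j else weight * p[i][j] + (1 - weight) * uniform
             for j in range(n)] for i in range(n)]


# Hypothetical 4 x 4 zero-diagonal doubly-stochastic (circulant) table.
p = [[0.0, 0.6, 0.3, 0.1],
     [0.1, 0.0, 0.6, 0.3],
     [0.3, 0.1, 0.0, 0.6],
     [0.6, 0.3, 0.1, 0.0]]
t = teleport(p)

off_diag = [(i, j) for i in range(4) for j in range(4) if i != j]
rank_preserved = all(
    (p[a][b] < p[c][d]) == (t[a][b] < t[c][d])
    for (a, b), (c, d) in combinations(off_diag, 2)
)
print([round(sum(row), 12) for row in t])  # rows still sum to 1
print(rank_preserved)                      # True: ordering of entries unchanged
```

Because the blended table stays doubly stochastic and only its entries' magnitudes (not their order) change, any procedure that uses ordinal rankings of the entries sees the same input.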

VI. CONCLUDING REMARKS 
A. Aggregation issues 

One might, using the indicated two-stage procedure, compare the hierarchical structure of 
geographic areas using internal migration tables at different levels of geographic aggregation 
(counties, states, regions, ...) (cf. [93]). To again use the example of France, based on a 
21 x 21 interregional table for 1962-68, Region Parisienne was the most hub-like [3, Sec. 4.1] 
[6], while using a finer 89 x 89 interdepartmental table for 1954-62, the dyad composed of 
Seine (that is, Paris and its immediate suburbs) together with the encircling Seine-et-Oise 
(administratively eliminated in 1964) was most cosmopolitan [7] [3, Sec. 6.1]. (In [H3], "two 
distinct approaches to assessing the effect of geographic scale on spatial interactions" were 
developed.) 

We can, in fact, directly compare the results of our study [23] of U. S. 1965-70 migration 
between 3,140 counties with a highly detailed study [H] for the very same period, conducted at 
the more aggregate level of 510 State Economic Areas (SEAs, collections of counties). In 
terms of relative cosmopolitan characteristics, the list based on the SEAs does have somewhat 
different emphases from that given above in terms of the counties. According to Fig. 2 
of [E], the most cosmopolitan SEAs, in decreasing order, were Alaska; Hawaii (2 SEAs); 
Southeast Florida (3 SEAs); Southwest Florida; North Florida; the Chicago SMSA; the 
New York SMSA; the Norfolk-Portsmouth SMSA; the San Bernardino and Riverside SMSA; 
the District of Columbia; and the Maryland suburbs of D. C. (2 SEAs). (As previously noted, 
in the county-level 1965-70 analysis, the two most cosmopolitan entities were the Chicago 
metropolitan pair of Cook and DuPage, and the District of Columbia, while in the 1995-00 
intercounty analysis, Brevard, FL and Mohave, AZ played these roles.) Let us also bring 
to the reader's attention a 2005 discussion paper in which the 1995-2000 U. S. interstate 
migration table is studied using both double-standardization and "social network analysis" 
[50]. (Hierarchical clustering is also employed, but apparently not the form based on strong 
components.) 

B. Max-flow/Min-cut application 

In [102], Newman applied the famous Ford-Fulkerson max-flow/min-cut theorem [103, 
Chap. 22] to weighted networks (which he mapped onto unweighted multigraphs). Earlier, 
this theorem had been used to study Spanish [85], Philippine [104], and Brazilian, Mexican 
and Argentinian [105] internal migration, US interindustry flows [2U, pp. 18-28] [106] [82, 
Sec. III] and the international flow of college students [21] (cf. [107]), in all of which the 
flows were left unadjusted, that is, neither doubly nor singly standardized. 

In this "multiterminal" approach, the maximum flow and the dual minimum edge cut-sets 
between all ordered pairs of nodes are found. Those cuts (often few, or even none) which 
partition the N nodes nontrivially, that is, into two sets each of cardinality greater than 1, 
are noted. The set in each such pair with the fewer nodes is regarded as a nodal cluster (a 
region, in the geographic context). It has the interesting, defining property that fewer people 
migrate into (from) it, as a whole, than into (from) its node. In the Spanish context, the 
(nodal) province of Badajoz was found to have a particularly large out-migration sphere of 
influence, and the (Basque) province of Vizcaya (site of Bilbao and Guernica), an extensive 
in-migration field [85]. In an analysis of 1967 US interindustry transactions based on 468 
industries, among the industries functioning as nodes of production complexes with large 
numbers of members were Advertising; Blast Furnaces and Steel Mills; Electronic 
Components; and Paperboard Containers and Boxes. Conversely, among those serving as 
nodes of consumption complexes were Petroleum Refining and Meat Animals [82] [106]. 
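The multiterminal procedure can be sketched with the Edmonds-Karp (breadth-first) variant of the Ford-Fulkerson method, applied to raw, unstandardized integer flow counts. This is a minimal illustration under stated assumptions: for each ordered pair it keeps a single minimum cut, the one whose source side is everything still reachable from the source in the final residual network, whereas a fuller treatment may need to examine every minimum cut. The function names and the 4-area table are hypothetical.

```python
from collections import deque


def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp maximum flow from s to t on an integer capacity matrix.
    Returns the flow value and the source side of a minimum cut."""
    n = len(cap)
    residual = [row[:] for row in cap]
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path: nodes reached by the last search form the source side.
            return flow, {v for v in range(n) if parent[v] != -1}
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def nodal_clusters(cap):
    """Smaller side of every nontrivial minimum cut over all ordered pairs."""
    n = len(cap)
    clusters = set()
    for s in range(n):
        for t in range(n):
            if s == t:
                continue
            _, side = max_flow_min_cut(cap, s, t)
            other = set(range(n)) - side
            if len(side) > 1 and len(other) > 1:  # nontrivial partition
                clusters.add(frozenset(min(side, other, key=len)))
    return clusters


# Hypothetical migration counts: areas 0-1 and 2-3 exchange many migrants
# with each other, but only one migrant moves each way between the pairs.
cap = [[0, 10, 0, 0],
       [10, 0, 1, 0],
       [0, 1, 0, 5],
       [0, 0, 5, 0]]
print(max_flow_min_cut(cap, 0, 3))                       # (1, {0, 1})
print(sorted(sorted(c) for c in nodal_clusters(cap)))    # [[0, 1], [2, 3]]
```

Here the single weak link between the two pairs is the bottleneck for every cross-pair flow, so both pairs surface as nodal clusters.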



C. Subdominant eigenvalue 

Pentney and Meila have extended spectral clustering algorithms to "asymmetric affinities" 
[108]. In line with their approach, we computed the subdominant eigenvalue (0.906253) 
of the 3,107 x 3,107 doubly-stochastic 1995-2000 intercounty migration table, and the 
associated eigenvector. (Trivially, the dominant eigenvalue is 1, and the components of the 
corresponding eigenvector are all equal.) Interestingly, the largest (most positive) seventy-eight 
components of this vector all corresponded to counties of Georgia, while the smallest (most 
negative) one hundred and ten components were all from one or another of the contiguous 
triad of Great Plains states, North Dakota, South Dakota and Nebraska. (The two most 
negative values are for Dewey and Ziebach Counties of South Dakota, which, as we have 
previously indicated, form the Cheyenne River Indian Reservation and formed the first cluster 
in the 1965-70 [agglomerative] hierarchical clustering [23].) In Fig. 16 we present a 
"list plot" of these components. (The counties are listed alphabetically within states, and 
the states themselves alphabetically also.) A rough gestalt estimate might yield some thirty 
to forty clusters. Dorogovtsev and Mendes have reviewed "the recent rapid progress in the 
statistical physics of evolving networks" [109]. 
FIG. 16: Components of the subdominant eigenvector of the doubly-stochastic form of the 
1995-2000 U. S. intercounty migration table. The highest-situated evident cluster is composed 
of counties of the state of Georgia 
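One way to approximate a subdominant eigenvector without a full eigendecomposition is sketched below. It assumes, as holds for a doubly-stochastic table, that the dominant eigenvector is the constant vector, and further that the subdominant eigenvalue is real and strictly largest in modulus among the remaining ones. Because the columns sum to 1, multiplying by the table keeps a vector's components summing to zero, so power iteration on zero-sum vectors bypasses the dominant eigenvalue. This is a hypothetical illustration, not necessarily how the reported value 0.906253 was obtained; the toy table has two blocks of three counties.

```python
import math


def subdominant_eigenpair(p, iters=500):
    """Power iteration restricted to zero-sum vectors. Assumes p is doubly
    stochastic (so the dominant eigenvector is constant) and that its
    subdominant eigenvalue is real and largest in modulus among the rest."""
    n = len(p)
    x = [k - (n - 1) / 2 for k in range(n)]  # arbitrary zero-sum start vector
    for _ in range(iters):
        y = [sum(p[i][j] * x[j] for j in range(n)) for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]  # remove round-off drift toward the constant vector
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    px = [sum(p[i][j] * x[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(x, px))  # Rayleigh quotient, |x| = 1
    return eigenvalue, x


# Toy doubly-stochastic table: two blocks of three counties; within-block
# off-diagonal entries 0.4, cross-block entries 0.2 / 3, zero diagonal.
a, b = 0.4, 0.2 / 3
p = [[0.0 if i == j else (a if i // 3 == j // 3 else b) for j in range(6)]
     for i in range(6)]
value, vector = subdominant_eigenpair(p)
print(round(value, 6))            # 2a - 3b = 0.6 for this block structure
print([v > 0 for v in vector])    # signs separate the two blocks
```

As in Fig. 16, the sign pattern of the subdominant eigenvector separates the two blocks; with real data the separation is graded rather than clean.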

D. U. S. internal migration network 

We have presented (Electronic-only material) master dendrograms descriptive of the rich, 
evolving geographical and sociological tapestry of the United States, as reflected in the 
1965-1970 and 1995-2000 migration flows between the 3,000+ county-level units. Our results 
have been derived using a demonstrably insightful two-stage methodology, double-standardization 
of the recorded flows followed by (strong component) hierarchical clustering, applicable to 
(weighted, directed) socioeconomic networks in general. Applying a graph-theoretic isolation 
criterion, we extracted particularly distinct large multicounty migration regions, well 
describable as "French Louisiana", "Northern Lower Michigan", "Northern New England", 
et al. Certain tightly-knit functional clusters, for example the states of Connecticut and 
Hawaii, as well as "South Jersey", are invariant over the thirty-year study period. Broad 
"cosmopolitan" or "hub-like" migration to and from "Sunbelt" counties (Clark County, Nevada 
[Las Vegas], for instance) became relatively more conspicuous, and migration associated with 
counties with large military installations (Pierce County, Washington [Fort Lewis and McChord 
Air Force Base], for example) less so. Further, the most cosmopolitan units for 1965-70 (the 
paired Chicago metropolitan counties of Cook and DuPage, Illinois, and the District of 
Columbia heading the list) were more cosmopolitan in character than the leading ones in the 
later analysis. We supplemented these analyses by studying both the square and a "teleported 
random walk" form of the doubly-stochastic table, as well as its subdominant eigenvalue. The 
hierarchical clustering obtained is robust against teleportation. 